Abstract

We show a new lower bound on the sample complexity of $(\varepsilon,\delta)$-differentially private
algorithms that accurately answer statistical queries on high-dimensional databases. The
novelty of our bound is that it depends optimally on the parameter $\delta$, which loosely
corresponds to the probability that the algorithm fails to be private, and is the first to smoothly
interpolate between approximate differential privacy ($\delta > 0$) and pure differential privacy
($\delta = 0$).

Specifically, we consider a database $D \in \{\pm 1\}^{n \times d}$ and its one-way marginals, which are
the $d$ queries of the form "What fraction of individual records have the $i$-th bit set to $+1$?"
We show that in order to answer all of these queries to within error $\pm\alpha$ (on average) while
satisfying $(\varepsilon,\delta)$-differential privacy, it is necessary that

$$n \ge \Omega\left(\frac{\sqrt{d \log(1/\delta)}}{\alpha\varepsilon}\right),$$

which is optimal up to constant factors. To prove our lower bound, we build on the
connection between fingerprinting codes and lower bounds in differential privacy (Bun, Ullman,
and Vadhan, STOC'14).

In addition to our lower bound, we give new purely and approximately differentially
private algorithms for answering arbitrary statistical queries that improve on the sample
complexity of the standard Laplace and Gaussian mechanisms for achieving worst-case
accuracy guarantees by a logarithmic factor.

1 Introduction


The goal of privacy-preserving data analysis is to enable rich statistical analysis of a database
while protecting the privacy of individuals whose data is in the database. A formal privacy
guarantee is given by $(\varepsilon,\delta)$-differential privacy [DMNS06, DKM+06], which ensures that no
individual's data has a significant influence on the information released about the database. The
two parameters $\varepsilon$ and $\delta$ control the level of privacy. Very roughly, $\varepsilon$ is an upper bound on
the amount of influence an individual's record has on the information released and $\delta$ is the
probability that this bound fails to hold,¹ so the definition becomes more stringent as $\varepsilon, \delta \to 0$.

A natural way to measure the tradeoff between privacy and utility is sample complexity:
the minimum number of records $n$ that is sufficient in order to publicly release a given set
of statistics about the database, while achieving both differential privacy and statistical
accuracy. Intuitively, it's easier to achieve these two goals when $n$ is large, as each individual's data
will have only a small influence on the aggregate statistics of interest. Conversely, the sample
complexity $n$ should increase as $\varepsilon$ and $\delta$ decrease (which strengthens the privacy guarantee).

The strongest version of differential privacy, in which $\delta = 0$, is known as pure differential
privacy. The sample complexity of achieving pure differential privacy is well known for many
settings (e.g. [HT10]). The more general case where $\delta > 0$ is known as approximate differential
privacy, and is less well understood. Recently, Bun, Ullman, and Vadhan [BUV14] showed how
to prove strong lower bounds for approximate differential privacy that are essentially optimal
for $\delta \approx 1/n$, which is essentially the weakest privacy guarantee that is still meaningful.²

Since $\delta$ bounds the probability of a complete privacy breach, we would like $\delta$ to be very
small. Thus we would like to quantify the cost (in terms of sample complexity) as $\delta \to 0$. In
this work we give lower bounds for approximately differentially private algorithms that are
nearly optimal for every choice of $\delta$, and smoothly interpolate between pure and approximate
differential privacy.

Specifically, we consider algorithms that compute the one-way marginals of the database, an
extremely simple and fundamental family of queries. For a database $D \in \{\pm 1\}^{n \times d}$, the $d$ one-way
marginals are simply the mean of the bits in each of the $d$ columns. Formally, we define

$$\overline{D} := \frac{1}{n} \sum_{i=1}^{n} D_i \in [\pm 1]^d,$$

where $D_i \in \{\pm 1\}^d$ is the $i$-th row of $D$. A mechanism $M$ is said to be accurate if, on input $D$, its
output is "close to" $\overline{D}$. Accuracy may be measured in a worst-case sense, i.e. $\|M(D) - \overline{D}\|_\infty \le \alpha$,
meaning every one-way marginal is answered with accuracy $\alpha$, or in an average-case sense,
i.e. $\|M(D) - \overline{D}\|_1 \le \alpha d$, meaning the marginals are answered with average accuracy $\alpha$.
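
As a concrete illustration of these definitions, the following minimal Python sketch computes the one-way marginals of a tiny $\pm 1$ database and evaluates both error measures for a candidate answer vector. The function names and the example data are illustrative assumptions and do not come from the paper.

```python
from typing import List, Sequence


def one_way_marginals(db: Sequence[Sequence[int]]) -> List[float]:
    """Column means of an n x d database with entries in {-1, +1}."""
    n, d = len(db), len(db[0])
    return [sum(row[j] for row in db) / n for j in range(d)]


def linf_error(answers: Sequence[float], truth: Sequence[float]) -> float:
    """Worst-case error: the largest deviation over all marginals."""
    return max(abs(a - t) for a, t in zip(answers, truth))


def l1_error(answers: Sequence[float], truth: Sequence[float]) -> float:
    """Total error; dividing by d gives the average error per marginal."""
    return sum(abs(a - t) for a, t in zip(answers, truth))


# Hypothetical example: n = 4 records, d = 3 attributes.
db = [[+1, -1, +1],
      [+1, +1, -1],
      [-1, +1, +1],
      [+1, +1, +1]]
truth = one_way_marginals(db)
answers = [0.4, 0.5, 0.8]          # some mechanism's (made-up) output
worst_case = linf_error(answers, truth)
total = l1_error(answers, truth)
```

With accuracy parameter $\alpha$, the worst-case guarantee asks for `worst_case <= alpha`, while the average-case guarantee only asks for `total <= alpha * d`.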

Some of the earliest results in differential privacy [DN03, DN04, BDMN05, DMNS06] give a
simple $(\varepsilon,\delta)$-differentially private algorithm, the Laplace mechanism, that computes the
one-way marginals of $D \in \{\pm 1\}^{n \times d}$ with average error $\alpha$ as long as

$$n \ge O\left(\min\left\{\frac{\sqrt{d \log(1/\delta)}}{\varepsilon\alpha}, \frac{d}{\varepsilon\alpha}\right\}\right). \qquad (1)$$


¹ This intuition is actually somewhat imprecise, although it is suitable for this informal discussion. See [KS08]
for a more precise semantic interpretation of $(\varepsilon,\delta)$-differential privacy.

² When $\delta > 1/n$ there are algorithms that are intuitively not private, yet satisfy $(0,\delta)$-differential privacy.

The previous best lower bounds are $n \ge \Omega(d/\varepsilon\alpha)$ [HT10] for pure differential privacy and
$n \ge \tilde{\Omega}(\sqrt{d}/\varepsilon\alpha)$ for approximate differential privacy with $\delta = o(1/n)$ [BUV14]. Our main result is an
optimal lower bound that combines the previous lower bounds.

Theorem 1.1 (Main Theorem). For every $\varepsilon \le O(1)$, every $2^{-\Omega(n)} \le \delta \le 1/n^{1+\Omega(1)}$ and every
$\alpha \le 1/10$, if $M : \{\pm 1\}^{n \times d} \to [\pm 1]^d$ is $(\varepsilon,\delta)$-differentially private and $\mathbb{E}\left[\|M(D) - \overline{D}\|_1\right] \le \alpha d$, then

$$n \ge \Omega\left(\frac{\sqrt{d \log(1/\delta)}}{\varepsilon\alpha}\right).$$

More generally, this is the first result showing that the sample complexity must grow by a
multiplicative factor of $\sqrt{\log(1/\delta)}$ for answering any family of queries, as opposed to an additive
dependence on $\delta$. We also remark that the assumption on the range of $\delta$ is necessary, as the
Laplace mechanism gives accuracy $\alpha$ and satisfies $(\varepsilon,0)$-differential privacy when $n \ge O(d/\varepsilon\alpha)$.


1.1 Average-Case Versus Worst-Case Error 

Our lower bound holds for mechanisms with an average-case ($L_1$) error guarantee. Thus, it
also holds for algorithms that achieve worst-case ($L_\infty$) error guarantees. The Laplace
mechanism gives a matching upper bound for average-case error. In many cases worst-case error
guarantees are preferable. For worst-case error, the sample complexity of the Laplace
mechanism degrades by an additional $\log d$ factor compared to (1).

Surprisingly, this degradation is not necessary. We present algorithms that answer every
one-way marginal with $\alpha$ accuracy and improve on the sample complexity of the Laplace
mechanism by roughly a $\log d$ factor. These algorithms demonstrate that the widely used technique
of adding independent noise to each query is suboptimal when the goal is to achieve worst-case
error guarantees.
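
For reference, the independent-noise baseline discussed above can be sketched as follows. This is a minimal illustration, not the paper's code: the noise scale $2d/(\varepsilon n)$ is the standard calibration to the $L_1$ sensitivity $2d/n$ of the marginals (replacing one row moves each column mean by at most $2/n$) and is an assumption here, since the text does not spell out the scale. The function name and the example database are likewise hypothetical.

```python
import random
from typing import List, Sequence


def laplace_mechanism(db: Sequence[Sequence[int]], eps: float,
                      rng: random.Random) -> List[float]:
    """Pure eps-DP answers to all d one-way marginals via independent Laplace noise."""
    n, d = len(db), len(db[0])
    scale = 2.0 * d / (eps * n)  # assumed calibration: L1 sensitivity (2d/n) / eps
    out = []
    for j in range(d):
        mean_j = sum(row[j] for row in db) / n
        # The difference of two i.i.d. exponentials with mean `scale`
        # is a Laplace variable with the same scale.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        # Clipping to [-1, 1] is post-processing and does not affect privacy.
        out.append(max(-1.0, min(1.0, mean_j + noise)))
    return out


# Hypothetical usage: n = 2000 records, d = 4 attributes, eps = 1.
db = [[1 if (i + j) % 3 else -1 for j in range(4)] for i in range(2000)]
noisy = laplace_mechanism(db, eps=1.0, rng=random.Random(0))
```

Each coordinate receives its own noise draw, so a worst-case guarantee over all $d$ coordinates has to pay for the largest of $d$ independent deviations, which is where the extra $\log d$ factor comes from.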

Our algorithm for pure differential privacy satisfies the following.

Theorem 1.2. For every $\varepsilon, \alpha > 0$, $d \ge 1$, and $n \ge 4d/\varepsilon\alpha$, there exists an efficient mechanism
$M : \{\pm 1\}^{n \times d} \to [\pm 1]^d$ that is $(\varepsilon,0)$-differentially private and

$$\forall D \in \{\pm 1\}^{n \times d} \qquad \mathbb{P}\left[\|M(D) - \overline{D}\|_\infty \ge \alpha\right] \le (2e)^{-d}.$$

And our algorithm for approximate differential privacy is as follows.

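Theorem 1.3. For every $\varepsilon, \delta, \alpha > 0$, $d \ge 1$, and

$$n \ge O\left(\frac{\sqrt{d \cdot \log(1/\delta) \cdot \log\log d}}{\varepsilon\alpha}\right),$$

there exists an efficient mechanism $M : \{\pm 1\}^{n \times d} \to [\pm 1]^d$ that is $(\varepsilon,\delta)$-differentially private and

$$\forall D \in \{\pm 1\}^{n \times d} \qquad \mathbb{P}\left[\|M(D) - \overline{D}\|_\infty \ge \alpha\right] \le \frac{1}{d^{\omega(1)}}.$$

These algorithms improve over the sample complexity of the best known mechanisms for
each privacy and accuracy guarantee by a factor of $(\log d)^{\Omega(1)}$. Namely, the Laplace
mechanism requires $n \ge O(d \cdot \log d/\varepsilon\alpha)$ samples for pure differential privacy and the Gaussian
mechanism requires $n \ge O(\sqrt{d \cdot \log(1/\delta) \cdot \log d}/\varepsilon\alpha)$ samples for approximate differential privacy.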
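
To make the size of the improvement tangible, the sketch below evaluates the leading terms of the four worst-case sample-complexity bounds just quoted. All constants hidden by the $O(\cdot)$ notation are set to 1, which is purely an assumption for illustration, so only the ratios between the functions are meaningful. The function names and parameter values are hypothetical.

```python
import math


def laplace_linf_pure(d: int, eps: float, alpha: float) -> float:
    """Leading term d * log(d) / (eps * alpha) for the Laplace mechanism."""
    return d * math.log(d) / (eps * alpha)


def theorem_1_2_pure(d: int, eps: float, alpha: float) -> float:
    """Leading term d / (eps * alpha) from Theorem 1.2 (constant dropped)."""
    return d / (eps * alpha)


def gaussian_linf_approx(d: int, eps: float, alpha: float, delta: float) -> float:
    """Leading term sqrt(d * log(1/delta) * log(d)) / (eps * alpha)."""
    return math.sqrt(d * math.log(1 / delta) * math.log(d)) / (eps * alpha)


def theorem_1_3_approx(d: int, eps: float, alpha: float, delta: float) -> float:
    """Leading term sqrt(d * log(1/delta) * log(log(d))) / (eps * alpha)."""
    return math.sqrt(d * math.log(1 / delta) * math.log(math.log(d))) / (eps * alpha)


# Hypothetical parameter choice.
d, eps, alpha, delta = 10**6, 1.0, 0.05, 1e-9
pure_gain = laplace_linf_pure(d, eps, alpha) / theorem_1_2_pure(d, eps, alpha)
approx_gain = (gaussian_linf_approx(d, eps, alpha, delta)
               / theorem_1_3_approx(d, eps, alpha, delta))
```

Under these assumptions `pure_gain` equals $\log d$ and `approx_gain` equals $\sqrt{\log d / \log\log d}$, independent of $\varepsilon$, $\alpha$, and $\delta$.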


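Privacy | Accuracy | Type | Previous bound | This work
$(\varepsilon,\delta)$ | $L_1$ or $L_\infty$ | Lower | $n = \tilde{\Omega}(\sqrt{d}/\alpha\varepsilon)$ [BUV14] | $n = \Omega(\sqrt{d \log(1/\delta)}/\alpha\varepsilon)$
$(\varepsilon,\delta)$ | $L_1$ | Upper | $n = O(\sqrt{d \log(1/\delta)}/\alpha\varepsilon)$ Laplace | -
$(\varepsilon,\delta)$ | $L_\infty$ | Upper | $n = O(\sqrt{d \cdot \log(1/\delta) \cdot \log d}/\alpha\varepsilon)$ Gaussian | $n = O(\sqrt{d \cdot \log(1/\delta) \cdot \log\log d}/\alpha\varepsilon)$
$(\varepsilon,0)$ | $L_1$ or $L_\infty$ | Lower | $n = \Omega(d/\alpha\varepsilon)$ [HT10] | -
$(\varepsilon,0)$ | $L_1$ | Upper | $n = O(d/\alpha\varepsilon)$ Laplace | -
$(\varepsilon,0)$ | $L_\infty$ | Upper | $n = O(d \cdot \log d/\alpha\varepsilon)$ Laplace | $n = O(d/\alpha\varepsilon)$

Figure 1: Summary of sample complexity upper and lower bounds for privately answering $d$
one-way marginals with accuracy $\alpha$.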


1.2 Techniques 

Lower Bounds: Our lower bound relies on a combinatorial object called a fingerprinting
code [BS98]. Fingerprinting codes were originally used in cryptography for watermarking
digital content, but several recent works have shown they are intimately connected to lower bounds
for differential privacy and related learning problems [Ull13, BUV14, HU14, SU14]. In
particular, Bun et al. [BUV14] showed that fingerprinting codes can be used to construct an attack
demonstrating that any mechanism that accurately answers one-way marginals is not
differentially private. Specifically, a fingerprinting code gives a distribution on individuals' data and a
corresponding "tracer" algorithm such that, if a database is constructed from the data of a fixed
subset of the individuals, then the tracer algorithm can identify at least one of the
individuals in that subset given only approximate answers to the one-way marginals of the database.
Their attack shows that a mechanism that satisfies $(1, o(1/n))$-differential privacy
requires $n \ge \tilde{\Omega}(\sqrt{d})$ samples to accurately compute one-way marginals.

Our proof uses a new, more general reduction from breaking fingerprinting codes to
differentially private data release. Specifically, our reduction uses group differential privacy. This
property states that if an algorithm is $(\varepsilon,\delta)$-differentially private with respect to the change of
one individual's data, then for any $k$, it is roughly $(k\varepsilon, e^{k\varepsilon}\delta)$-differentially private with respect
to the change of $k$ individuals' data. Thus an $(\varepsilon,\delta)$-differentially private algorithm provides a
meaningful privacy guarantee for groups of size $k \approx \log(1/\delta)/\varepsilon$.

To use this in our reduction, we start with a mechanism $M$ that takes a database of $n$ rows
and is $(\varepsilon,\delta)$-differentially private. We design a mechanism $M_k$ that takes a database of $n/k$ rows,
copies each of its rows $k$ times, and uses the result as input to $M$. The resulting mechanism
is roughly $(k\varepsilon, e^{k\varepsilon}\delta)$-differentially private. For our choice of $k$, these parameters will be small
enough to apply the attack of [BUV14] to obtain a lower bound on the number of samples used
by $M_k$, which is $n/k$. Thus, for larger values of $k$ (equivalently, smaller values of $\delta$), we obtain a
stronger lower bound. The remainder of the proof is to quantify the parameters precisely.
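
The row-copying wrapper is simple enough to sketch directly. The code below is an illustrative assumption about how one might express it: `replicate_rows` and `make_M_k` are hypothetical names, padding with all-$(+1)$ rows when $k$ does not divide $n$ follows the proof of Theorem 3.1, and `exact_means` is a deliberately non-private stand-in for $M$ used only to show the data flow.

```python
from typing import Callable, List, Sequence

Database = List[List[int]]


def replicate_rows(small_db: Sequence[Sequence[int]], k: int, n: int) -> Database:
    """Copy every row k times, then pad with all-(+1) rows up to n rows."""
    d = len(small_db[0])
    big = [list(row) for row in small_db for _ in range(k)]
    if len(big) > n:
        raise ValueError("k copies of the input exceed n rows")
    big.extend([[1] * d for _ in range(n - len(big))])
    return big


def make_M_k(M: Callable[[Database], List[float]], k: int,
             n: int) -> Callable[[Sequence[Sequence[int]]], List[float]]:
    """Wrap a mechanism M on n rows into a mechanism M_k on floor(n/k) rows."""
    def M_k(small_db: Sequence[Sequence[int]]) -> List[float]:
        return M(replicate_rows(small_db, k, n))
    return M_k


def exact_means(db: Database) -> List[float]:
    """Non-private placeholder for M: exact one-way marginals."""
    n, d = len(db), len(db[0])
    return [sum(row[j] for row in db) / n for j in range(d)]


small = [[1, -1], [-1, -1]]
padded = replicate_rows(small, k=2, n=5)
M_k = make_M_k(exact_means, k=2, n=4)
answers = M_k(small)
```

Changing one row of the small database changes $k$ rows of the replicated one, which is exactly why group privacy of $M$ translates into ordinary privacy of $M_k$.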

Upper Bounds: Our algorithm for pure differential privacy and worst-case error is an
instantiation of the exponential mechanism [MT07] using the $L_\infty$ norm. That is, the mechanism
samples $y \in \mathbb{R}^d$ with probability proportional to $\exp(-\eta\|y\|_\infty)$ and outputs $M(D) = \overline{D} + y$. In
contrast, adding independent Laplace noise corresponds to using the exponential mechanism
with the $L_1$ norm and adding independent Gaussian noise corresponds to using the
exponential mechanism with the $L_2$ norm squared. Using this distribution turns out to give better tail
bounds than adding independent noise.

For approximate differential privacy, we use a completely different algorithm. We start by
adding independent Gaussian noise to each marginal. However, rather than using a union
bound to show that each Gaussian error is small with high probability, we use a Chernoff
bound to show that most errors are small. Namely, with the sample complexity that we allow
$M$, we can ensure that all but a $1/\mathrm{polylog}(d)$ fraction of the errors are small. Now we "fix"
the $d/\mathrm{polylog}(d)$ marginals that are bad. The trick is that we use the sparse vector algorithm,
which allows us to identify and fix these $d/\mathrm{polylog}(d)$ marginals with sample complexity
corresponding to only $d/\mathrm{polylog}(d)$ queries, rather than $d$ queries.

2 Preliminaries 

We define a database $D \in \{\pm 1\}^{n \times d}$ to be a matrix of $n$ rows, where each row corresponds to an
individual, and each row has dimension $d$ (consists of $d$ binary attributes). We say that two
databases $D, D' \in \{\pm 1\}^{n \times d}$ are adjacent if they differ only by a single row, and we denote this by
$D \sim D'$. In particular, we can replace the $i$-th row of a database $D$ with some fixed element of
$\{\pm 1\}^d$ to obtain another database $D_{-i} \sim D$.

Definition 2.1 (Differential Privacy [DMNS06]). Let $M : \{\pm 1\}^{n \times d} \to \mathcal{R}$ be a randomized
mechanism. We say that $M$ is $(\varepsilon,\delta)$-differentially private if for every two adjacent databases $D \sim D'$
and every subset $S \subseteq \mathcal{R}$,

$$\mathbb{P}[M(D) \in S] \le e^{\varepsilon} \cdot \mathbb{P}[M(D') \in S] + \delta.$$
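
When the output range is finite, the condition in Definition 2.1 can be checked mechanically for a single pair of adjacent databases. The sketch below relies on the fact that the subset $S$ maximizing $\mathbb{P}[M(D) \in S] - e^{\varepsilon}\,\mathbb{P}[M(D') \in S]$ consists of exactly those outputs whose term is positive. The function names and the two example distributions are hypothetical and serve only as an illustration.

```python
import math
from typing import Sequence


def worst_case_gap(p: Sequence[float], q: Sequence[float], eps: float) -> float:
    """max over subsets S of P(S) - e^eps * Q(S), for finite distributions p and q."""
    return sum(max(0.0, pi - math.exp(eps) * qi) for pi, qi in zip(p, q))


def satisfies_dp_pair(p: Sequence[float], q: Sequence[float],
                      eps: float, delta: float, tol: float = 1e-12) -> bool:
    """Check the (eps, delta) inequality in both directions for one adjacent pair."""
    return (worst_case_gap(p, q, eps) <= delta + tol
            and worst_case_gap(q, p, eps) <= delta + tol)


# Made-up output distributions of a mechanism on two adjacent databases.
p = [0.6, 0.4]
q = [0.4, 0.6]
ok_pure = satisfies_dp_pair(p, q, eps=0.5, delta=0.0)
fails_pure = satisfies_dp_pair(p, q, eps=0.1, delta=0.0)
ok_approx = satisfies_dp_pair(p, q, eps=0.1, delta=0.16)
```

A full verification of differential privacy would repeat this check over every pair of adjacent databases; the sketch covers one pair only.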

A well known fact about differential privacy is that it generalizes smoothly to databases
that differ on more than a single row. We say that two databases $D, D' \in \{\pm 1\}^{n \times d}$ are $k$-adjacent
if they differ by at most $k$ rows, and we denote this by $D \sim_k D'$.

Fact 2.2 (Group Differential Privacy). For every $k \ge 1$, if $M : \{\pm 1\}^{n \times d} \to \mathcal{R}$ is $(\varepsilon,\delta)$-differentially
private, then for every two $k$-adjacent databases $D \sim_k D'$, and every subset $S \subseteq \mathcal{R}$,

$$\mathbb{P}[M(D) \in S] \le e^{k\varepsilon} \cdot \mathbb{P}[M(D') \in S] + \frac{e^{k\varepsilon} - 1}{e^{\varepsilon} - 1} \cdot \delta.$$
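
The group privacy parameters are easy to tabulate. The helper below is a small illustrative sketch (the name `group_privacy` is an assumption) that returns the pair $\left(k\varepsilon,\ \frac{e^{k\varepsilon}-1}{e^{\varepsilon}-1}\delta\right)$ for $\varepsilon > 0$.

```python
import math
from typing import Tuple


def group_privacy(eps: float, delta: float, k: int) -> Tuple[float, float]:
    """Privacy parameters for groups of size k, assuming eps > 0."""
    eps_k = k * eps
    # expm1(x) = e^x - 1, computed accurately even for small x.
    delta_k = math.expm1(k * eps) / math.expm1(eps) * delta
    return eps_k, delta_k


eps_3, delta_3 = group_privacy(eps=1.0, delta=1e-9, k=3)
```

The factor multiplying $\delta$ is the geometric sum $1 + e^{\varepsilon} + \dots + e^{(k-1)\varepsilon}$, so $\delta_k$ grows exponentially in $k$; this is what limits the useful group size to roughly $\log(1/\delta)/\varepsilon$.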

All of the upper and lower bounds for one-way marginals have a multiplicative $1/\alpha\varepsilon$
dependence on the accuracy $\alpha$ and the privacy loss $\varepsilon$. This is no coincidence: there is a generic
reduction.

Fact 2.3 ($\alpha$ and $\varepsilon$ dependence). Let $p \in [1,\infty]$ and $\alpha, \varepsilon, \delta \in [0, 1/10]$.

Suppose there exists an $(\varepsilon,\delta)$-differentially private mechanism $M : \{\pm 1\}^{n \times d} \to [\pm 1]^d$ such that for
every database $D \in \{\pm 1\}^{n \times d}$,

$$\mathbb{E}\left[\|M(D) - \overline{D}\|_p\right] \le \alpha d^{1/p}.$$

Then there exists a $(1, \delta/\varepsilon)$-differentially private mechanism $M' : \{\pm 1\}^{n' \times d} \to [\pm 1]^d$ for
$n' = \Theta(\alpha\varepsilon n)$ such that for every database $D' \in \{\pm 1\}^{n' \times d}$,

$$\mathbb{E}\left[\|M'(D') - \overline{D'}\|_p\right] \le d^{1/p}/10.$$

This fact allows us to suppress the accuracy parameter $\alpha$ and the privacy loss $\varepsilon$ when
proving our lower bounds. Namely, if we prove a lower bound of $n' \ge n^*$ for all $(1,\delta)$-differentially
private mechanisms $M' : \{\pm 1\}^{n' \times d} \to [\pm 1]^d$ with $\mathbb{E}\left[\|M'(D') - \overline{D'}\|_p\right] \le d^{1/p}/10$, then we obtain
a lower bound of $n \ge \Omega(n^*/\alpha\varepsilon)$ for all $(\varepsilon, \varepsilon\delta)$-differentially private mechanisms $M : \{\pm 1\}^{n \times d} \to
[\pm 1]^d$ with $\mathbb{E}\left[\|M(D) - \overline{D}\|_p\right] \le \alpha d^{1/p}$. So we will simply fix the parameters $\alpha = 1/10$ and $\varepsilon = 1$ in
our lower bounds.

3 Lower Bounds for Approximate Differential Privacy 

Our main theorem can be stated as follows. 

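Theorem 3.1 (Main Theorem). Let $M : \{\pm 1\}^{n \times d} \to [\pm 1]^d$ be a $(1,\delta)$-differentially private
mechanism that answers one-way marginals such that

$$\forall D \in \{\pm 1\}^{n \times d} \qquad \mathbb{E}_M\left[\|M(D) - \overline{D}\|_1\right] \le \frac{d}{10},$$

where $\overline{D}$ is the true answer vector. If $2^{-\Omega(n)} \le \delta \le 1/n^{1+\Omega(1)}$ and $n$ is sufficiently large, then

$$d \le O\left(\frac{n^2}{\log(1/\delta)}\right).$$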

Theorem 1.1 in the introduction follows by rearranging terms, and applying Fact 2.3. The 
statement above is more convenient technically, but the statement in the introduction is more 
consistent with the literature. 

First we must introduce fingerprinting codes. The following definition is tailored to the
application to privacy. Fingerprinting codes were originally defined by Boneh and Shaw [BS98]
with a worst-case accuracy guarantee. Subsequent works [BUV14, SU14] have altered the
accuracy guarantee to an average-case one, which we use here.

Definition 3.2 ($L_1$ Fingerprinting Code). An $\varepsilon$-complete $\delta$-sound $\alpha$-robust $L_1$ fingerprinting code
for $n$ users with length $d$ is a pair of random variables $D \in \{\pm 1\}^{n \times d}$ and $\mathrm{Trace} : [\pm 1]^d \to 2^{[n]}$ such
that the following hold.

Completeness: For any fixed $M : \{\pm 1\}^{n \times d} \to [\pm 1]^d$,

$$\mathbb{P}\left[\left(\|M(D) - \overline{D}\|_1 \le \alpha d\right) \wedge \left(\mathrm{Trace}(M(D)) = \emptyset\right)\right] \le \varepsilon.$$

Soundness: For any $i \in [n]$ and fixed $M : \{\pm 1\}^{n \times d} \to [\pm 1]^d$,

$$\mathbb{P}\left[i \in \mathrm{Trace}(M(D_{-i}))\right] \le \delta,$$

where $D_{-i}$ denotes $D$ with the $i$-th row replaced by some fixed element of $\{\pm 1\}^d$.

Fingerprinting codes with optimal length were first constructed by Tardos [Tar08] (for
worst-case error) and subsequent works [BUV14, SU14] have adapted Tardos' construction to
work for average-case error guarantees, which yields the following theorem.

Theorem 3.3. For every $n \ge 1$, $\delta > 0$, and $d \ge d_{n,\delta} = O(n^2 \log(1/\delta))$, there exists a $1/100$-complete
$\delta$-sound $1/8$-robust $L_1$ fingerprinting code for $n$ users with length $d$.

We now show how the existence of fingerprinting codes implies our lower bound. 

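Proof of Theorem 3.1 from Theorem 3.3. Let $M : \{\pm 1\}^{n \times d} \to [\pm 1]^d$ be a $(1,\delta)$-differentially private
mechanism such that

$$\forall D \in \{\pm 1\}^{n \times d} \qquad \mathbb{E}_M\left[\|M(D) - \overline{D}\|_1\right] \le \frac{d}{10}.$$

Then, by Markov's inequality,

$$\forall D \in \{\pm 1\}^{n \times d} \qquad \mathbb{P}_M\left[\|M(D) - \overline{D}\|_1 > \frac{d}{9}\right] \le \frac{9}{10}. \qquad (2)$$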


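Let $k$ be a parameter to be chosen later. Let $n_k = \lfloor n/k \rfloor$. Let $M_k : \{\pm 1\}^{n_k \times d} \to [\pm 1]^d$ be the
following mechanism. On input $D^* \in \{\pm 1\}^{n_k \times d}$, $M_k$ creates $D \in \{\pm 1\}^{n \times d}$ by taking $k$ copies of $D^*$
and filling the remaining entries with 1s. Then $M_k$ runs $M$ on $D$ and outputs $M(D)$.

By group privacy (Fact 2.2), $M_k$ is a $\left(\varepsilon_k = k,\ \delta_k = \frac{e^k - 1}{e - 1}\delta\right)$-differentially private mechanism. By
the triangle inequality,

$$\|M_k(D^*) - \overline{D^*}\|_1 \le \|M(D) - \overline{D}\|_1 + \|\overline{D} - \overline{D^*}\|_1. \qquad (3)$$

Now, for each $j \in [d]$,

$$\overline{D}_j = \frac{k \cdot n_k}{n} \cdot \overline{D^*}_j + \frac{n - k \cdot n_k}{n} \cdot 1.$$

Thus

$$\left|\overline{D}_j - \overline{D^*}_j\right| = \left|\left(\frac{k \cdot n_k}{n} - 1\right)\overline{D^*}_j + \frac{n - k \cdot n_k}{n}\right| = \frac{n - k \cdot n_k}{n}\left|1 - \overline{D^*}_j\right| \le 2\,\frac{n - k \cdot n_k}{n}.$$

We have

$$\frac{n - k \cdot n_k}{n} = \frac{n - k\lfloor n/k \rfloor}{n} \le \frac{n - k(n/k - 1)}{n} = \frac{k}{n}.$$

Thus $\|\overline{D} - \overline{D^*}\|_\infty \le 2k/n$. Assume $k \le n/200$. Thus $\|\overline{D} - \overline{D^*}\|_1 \le d/100$ and, by (2) and (3),

$$\mathbb{P}_{M_k}\left[\|M_k(D^*) - \overline{D^*}\|_1 > \frac{d}{8}\right] \le \mathbb{P}_M\left[\|M(D) - \overline{D}\|_1 > \frac{d}{9}\right] \le \frac{9}{10}. \qquad (4)$$

Assume $d \ge d_{n_k,\delta}$, where $d_{n_k,\delta} = O(n_k^2 \log(1/\delta))$ is as in Theorem 3.3. We will show by
contradiction that this cannot be, that is, $d \le O(n_k^2 \log(1/\delta))$. Let $D^* \in \{\pm 1\}^{n_k \times d}$ and $\mathrm{Trace} : [\pm 1]^d \to 2^{[n_k]}$
be a $1/100$-complete $\delta$-sound $1/8$-robust $L_1$ fingerprinting code for $n_k$ users of length $d$.

By the completeness of the fingerprinting code,

$$\mathbb{P}\left[\|M_k(D^*) - \overline{D^*}\|_1 \le \frac{d}{8} \wedge \mathrm{Trace}(M_k(D^*)) = \emptyset\right] \le \frac{1}{100}. \qquad (5)$$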


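Combining (4) and (5) gives

$$\mathbb{P}\left[\mathrm{Trace}(M_k(D^*)) \ne \emptyset\right] \ge \frac{9}{100} \ge \frac{1}{12}.$$

In particular, there exists $i^* \in [n_k]$ such that

$$\mathbb{P}\left[i^* \in \mathrm{Trace}(M_k(D^*))\right] \ge \frac{1}{12 n_k}. \qquad (6)$$

We have that $\mathrm{Trace}(M_k(D^*))$ is an $(\varepsilon_k, \delta_k)$-differentially private function of $D^*$, as it is only
postprocessing $M_k(D^*)$. Thus

$$\mathbb{P}\left[i^* \in \mathrm{Trace}(M_k(D^*))\right] \le e^{\varepsilon_k}\,\mathbb{P}\left[i^* \in \mathrm{Trace}(M_k(D^*_{-i^*}))\right] + \delta_k \le e^{\varepsilon_k}\delta + \delta_k, \qquad (7)$$

where the second inequality follows from the soundness of the fingerprinting code.
Combining (6) and (7) gives

$$\frac{1}{12 n_k} \le e^{\varepsilon_k}\delta + \delta_k = e^{k}\delta + \frac{e^{k} - 1}{e - 1}\delta = \frac{e^{k+1} - 1}{e - 1}\delta < e^{k+1}\delta. \qquad (8)$$

If $k \le \log(1/12 n_k \delta) - 1$, then (8) gives a contradiction. Let $k = \lfloor \log(1/12 n \delta) - 1 \rfloor$. Assuming
$\delta \ge e^{-n/200}$ ensures $k \le n/200$, as required. Assuming $\delta \le 1/n^{1+\gamma}$ implies $k \ge \log(1/\delta)/(1 + 1/\gamma) -
5 \ge \Omega(\log(1/\delta))$. This setting of $k$ gives a contradiction, which implies that

$$d < d_{n_k,\delta} = O\left(n_k^2 \log(1/\delta)\right) = O\left(\frac{n^2}{k^2}\log(1/\delta)\right) = O\left(\frac{n^2}{\log(1/\delta)}\right),$$

as required.

□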
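
The choice of $k$ in the proof can be checked numerically. The sketch below computes $k = \lfloor \log(1/12 n \delta) - 1 \rfloor$ with natural logarithms and verifies, for one hypothetical pair $(n, \delta)$, the two conditions the proof needs: $k \le n/200$, and $e^{k+1}\delta \le 1/(12 n_k)$, which is what contradicts (8). The function name and the parameter values are assumptions made for illustration.

```python
import math


def choose_k(n: int, delta: float) -> int:
    """Group size k = floor(log(1 / (12 * n * delta)) - 1) used in the reduction."""
    return math.floor(math.log(1.0 / (12.0 * n * delta)) - 1.0)


# Hypothetical parameters with delta far below 1/n.
n, delta = 10_000, 1e-20
k = choose_k(n, delta)
n_k = n // k                                   # rows given to M_k
small_enough = k <= n / 200                    # needed for the padding error bound
contradicts_8 = math.exp(k + 1) * delta <= 1.0 / (12 * n_k)
```

Since $k$ grows like $\log(1/\delta)$, the fingerprinting bound $d \le O(n_k^2 \log(1/\delta))$ with $n_k \approx n/k$ turns into $d \le O(n^2/\log(1/\delta))$.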


4 New Mechanisms for $L_\infty$ Error

Adding independent noise seems very natural for one-way marginals, but it is suboptimal if
one is interested in worst-case (i.e. $L_\infty$) error bounds, rather than average-case (i.e. $L_1$) error
bounds.

4.1 Pure Differential Privacy 

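Theorem 1.2 follows from Theorem 4.1. In particular, the mechanism $M : \{\pm 1\}^{n \times d} \to [\pm 1]^d$ in
Theorem 1.2 is given by $M(D) = \overline{D} + Y$, where $Y \sim \mathcal{D}$ and $\mathcal{D}$ is the distribution from Theorem
4.1 with $\Delta = 2/n$.³

Theorem 4.1. For all $\varepsilon > 0$, $d \ge 1$, and $\Delta > 0$, there exists a continuous distribution $\mathcal{D}$ on $\mathbb{R}^d$ with
the following properties.

• Privacy: If $x, x' \in \mathbb{R}^d$ with $\|x - x'\|_\infty \le \Delta$, then

$$\mathbb{P}_{Y \sim \mathcal{D}}[x + Y \in S] \le e^{\varepsilon}\,\mathbb{P}_{Y \sim \mathcal{D}}[x' + Y \in S]$$

for all measurable $S \subseteq \mathbb{R}^d$.

• Accuracy: For all $\alpha > 0$,

$$\mathbb{P}_{Y \sim \mathcal{D}}\left[\|Y\|_\infty \ge \alpha\right] \le \left(\frac{\varepsilon\alpha}{\Delta d}\right)^{d} e^{d - \varepsilon\alpha/\Delta}.$$

In particular, if $d \le \varepsilon\alpha/2\Delta$, then $\mathbb{P}_{Y \sim \mathcal{D}}\left[\|Y\|_\infty \ge \alpha\right] \le (2e)^{-d}$.

• Efficiency: $\mathcal{D}$ can be efficiently sampled.

³ Note that we must truncate the output of $M$ to ensure that $M(D)$ is always in $[\pm 1]^d$.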

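Proof. The distribution $\mathcal{D}$ is simply an instantiation of the exponential mechanism [MT07]. In
particular, the probability density function is given by

$$\mathrm{pdf}_{\mathcal{D}}(y) \propto \exp\left(-\frac{\varepsilon}{\Delta}\|y\|_\infty\right).$$

Formally, for every measurable $S \subseteq \mathbb{R}^d$,

$$\mathbb{P}_{Y \sim \mathcal{D}}[Y \in S] = \frac{\int_S \exp\left(-\frac{\varepsilon}{\Delta}\|y\|_\infty\right) dy}{\int_{\mathbb{R}^d} \exp\left(-\frac{\varepsilon}{\Delta}\|y\|_\infty\right) dy}.$$

Firstly, this is clearly a well-defined distribution as long as $\varepsilon/\Delta > 0$.

Privacy is easy to verify: It suffices to bound the ratio of the probability densities for the
shifted distributions. For $x, x' \in \mathbb{R}^d$ with $\|x' - x\|_\infty \le \Delta$, by the triangle inequality,

$$\frac{\mathrm{pdf}_{\mathcal{D}}(x + y)}{\mathrm{pdf}_{\mathcal{D}}(x' + y)} = \frac{\exp\left(-\frac{\varepsilon}{\Delta}\|x + y\|_\infty\right)}{\exp\left(-\frac{\varepsilon}{\Delta}\|x' + y\|_\infty\right)} \le \exp\left(\frac{\varepsilon}{\Delta}\|x' - x\|_\infty\right) \le e^{\varepsilon}.$$

Define a distribution $\mathcal{D}^*$ on $[0,\infty)$ by $Z \sim \mathcal{D}^*$ meaning $Z = \|Y\|_\infty$ for $Y \sim \mathcal{D}$. To prove
accuracy, we must give a tail bound on $\mathcal{D}^*$. The probability density function of $\mathcal{D}^*$ is given by

$$\mathrm{pdf}_{\mathcal{D}^*}(z) \propto z^{d-1} \exp\left(-\frac{\varepsilon}{\Delta} z\right),$$

which is obtained by integrating the probability density function of $\mathcal{D}$ over the infinity-ball of
radius $z$, which has surface area $d 2^d z^{d-1} \propto z^{d-1}$. Thus $\mathcal{D}^*$ is precisely the gamma distribution
with shape $d$ and mean $d\Delta/\varepsilon$. The moment generating function is therefore

$$\mathbb{E}_{Z \sim \mathcal{D}^*}\left[e^{tZ}\right] = \left(1 - \frac{\Delta}{\varepsilon} t\right)^{-d}$$

for all $t < \varepsilon/\Delta$. By Markov's inequality,

$$\mathbb{P}_{Z \sim \mathcal{D}^*}[Z \ge \alpha] \le \frac{\mathbb{E}_{Z \sim \mathcal{D}^*}\left[e^{tZ}\right]}{e^{t\alpha}} = \left(1 - \frac{\Delta}{\varepsilon} t\right)^{-d} e^{-t\alpha}.$$

Setting $t = \varepsilon/\Delta - d/\alpha$ gives the required bound.

It is easy to verify that $Y \sim \mathcal{D}$ can be sampled by first sampling a radius $R$ from a gamma
distribution with shape $d + 1$ and mean $(d + 1)\Delta/\varepsilon$ and then sampling $Y \in [\pm R]^d$ uniformly
at random. To sample $R$ we can set $R = -\frac{\Delta}{\varepsilon}\sum_{i=0}^{d} \log U_i$, where each $U_i \in (0,1]$ is uniform and
independent. This gives an algorithm (in the form of an explicit circuit) to sample $\mathcal{D}$ that uses
only $O(d)$ real arithmetic operations, $d + 1$ logarithms, and $2d + 1$ independent uniform samples
from $[0,1]$.

□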
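
The sampling procedure at the end of the proof translates almost line by line into code. The sketch below follows that description: a gamma-distributed radius built from $d + 1$ logarithms of uniforms, then a uniform point in the cube $[\pm R]^d$. The function name, the argument names, and the example parameters are illustrative assumptions.

```python
import math
import random
from typing import List


def sample_linf_noise(d: int, eps: float, sensitivity: float,
                      rng: random.Random) -> List[float]:
    """Draw Y with density proportional to exp(-(eps / sensitivity) * ||y||_inf)."""
    scale = sensitivity / eps
    # R = -(Delta / eps) * sum_{i=0}^{d} log U_i is gamma with shape d + 1.
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    radius = -scale * sum(math.log(1.0 - rng.random()) for _ in range(d + 1))
    # Y is uniform in the cube [-R, R]^d.
    return [rng.uniform(-radius, radius) for _ in range(d)]


# Hypothetical usage: d = 5 marginals, eps = 1, Delta = 2/n with n = 20.
rng = random.Random(1)
y = sample_linf_noise(d=5, eps=1.0, sensitivity=0.1, rng=rng)
```

Averaging $\|Y\|_\infty$ over many draws should approach the mean $d\Delta/\varepsilon$ of the gamma distribution $\mathcal{D}^*$, which gives a quick sanity check of an implementation.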
4.2 Approximate Differential Privacy

Our algorithm for approximate differential privacy makes use of a powerful tool from the
literature [DNR+09, HR10, DNPR10, RR10] called the sparse vector algorithm:

Theorem 4.2 (Sparse Vector). For every $c, k \ge 1$, $\varepsilon, \delta, \alpha, \beta > 0$, and

$$n \ge O\left(\frac{\sqrt{c \log(1/\delta)}\,\log(k/\beta)}{\alpha\varepsilon}\right),$$

there exists a mechanism SV with the following properties.

• SV takes as input a database $D \in \mathcal{X}^n$ and provides answers $a_1, \ldots, a_k \in [\pm 1]$ to $k$ (adaptive)
linear queries $q_1, \ldots, q_k : \mathcal{X} \to [\pm 1]$.

• SV is $(\varepsilon,\delta)$-differentially private.

• Assuming

$$\left|\left\{j \in [k] : |q_j(D)| > \alpha/2\right\}\right| \le c,$$

we have

$$\mathbb{P}\left[\forall j \in [k] \quad |a_j - q_j(D)| \le \alpha\right] \ge 1 - \beta.$$

A proof of this theorem can be found in [DR13, Theorem 3.28].⁴ We now describe our
approximately differentially private mechanism.

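Parameters: $\varepsilon, \delta > 0$.

Input: $D \in \{\pm 1\}^{n \times d}$.

Let

    $\sigma = 5\sqrt{d \log(1/\delta)}/\varepsilon n$,    $\alpha = 8\sigma\sqrt{\log\log d}$.

For $j \in [d]$, let $\tilde{a}_j = \overline{D}_j + z_j$ where $z_j \sim N(0, \sigma^2)$.

Instantiate SV from Theorem 4.2 with parameters

    $c_{SV} = 2d/\log^8 d$,    $k_{SV} = d$,    $\varepsilon_{SV} = \varepsilon/2$,    $\delta_{SV} = \delta/2$,
    $\alpha_{SV} = \alpha/2$,    $\beta_{SV} = e^{-\log^4 d}$.

For $j \in [d]$, define $q_j : \{\pm 1\}^d \to [\pm 1]$ by $q_j(x) = (x_j - \tilde{a}_j)/2$.

Let $\hat{a}_1, \ldots, \hat{a}_d$ be the answers to $q_1, \ldots, q_d$ given by SV.

For $j \in [d]$, let $a_j = \tilde{a}_j + 2\hat{a}_j$.

Output $a_1, \ldots, a_d$.


Figure 2: Approximately DP mechanism $M : \{\pm 1\}^{n \times d} \to [\pm 1]^d$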
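
The parameter settings in Figure 2 are related by simple arithmetic, which the following sketch spells out. It only restates the settings $\sigma = 5\sqrt{d\log(1/\delta)}/\varepsilon n$, $\alpha = 8\sigma\sqrt{\log\log d}$, $c_{SV} = 2d/\log^8 d$, $k_{SV} = d$, $\varepsilon_{SV} = \varepsilon/2$, $\delta_{SV} = \delta/2$, $\alpha_{SV} = \alpha/2$, $\beta_{SV} = e^{-\log^4 d}$ and does not implement the sparse vector algorithm itself. The function name, the dictionary keys, and the example values are illustrative assumptions; logarithms are natural and $d > e$ is assumed so that $\log\log d > 0$.

```python
import math
from typing import Dict


def figure2_parameters(n: int, d: int, eps: float, delta: float) -> Dict[str, float]:
    """Noise scale, target accuracy, and sparse vector settings from Figure 2."""
    log_d = math.log(d)
    sigma = 5.0 * math.sqrt(d * math.log(1.0 / delta)) / (eps * n)
    alpha = 8.0 * sigma * math.sqrt(math.log(log_d))
    return {
        "sigma": sigma,
        "alpha": alpha,
        "c_sv": 2.0 * d / log_d ** 8,       # budget of "bad" marginals
        "k_sv": d,                          # number of queries passed to SV
        "eps_sv": eps / 2.0,
        "delta_sv": delta / 2.0,
        "alpha_sv": alpha / 2.0,
        "beta_sv": math.exp(-log_d ** 4),   # may underflow to 0.0 in floating point
    }


# Hypothetical parameter choice.
params = figure2_parameters(n=10**6, d=10**4, eps=1.0, delta=1e-9)
# Gaussian tail expression exp(-alpha^2 / (8 sigma^2)) for the event |z_j| > alpha / 2.
per_marginal_tail = math.exp(-params["alpha"] ** 2 / (8.0 * params["sigma"] ** 2))
```

With these settings `per_marginal_tail` equals $1/\log^8 d$, so the expected number of marginals whose Gaussian error exceeds $\alpha/2$ is at most $d/\log^8 d$, half of the budget $c_{SV}$.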


⁴ Note that the algorithms in the literature are designed to sometimes output $\perp$ as an answer or halt prematurely.
To modify these algorithms into the form given by Theorem 4.2 simply output 0 in these cases.

Proof of Theorem 1.3. Firstly, we consider the privacy of $M$: $\tilde{a}$ is the output of the Gaussian
mechanism with parameters to ensure that it is an $(\varepsilon/2, \delta/2)$-differentially private function of $D$.
Likewise $\hat{a}$ is the output of SV with parameters to ensure that it is also an $(\varepsilon/2, \delta/2)$-differentially
private function of $D$. Since the output is $\tilde{a} + 2\hat{a}$, composition implies that $M$ as a whole is
$(\varepsilon,\delta)$-differentially private, as required.

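Now we must prove accuracy. Suppose that $|\hat{a}_j - q_j(D)| \le \alpha_{SV} = \alpha/2$ for all $j \in [d]$. Then

$$\begin{aligned}
|a_j - \overline{D}_j| &= |\tilde{a}_j + 2\hat{a}_j - \overline{D}_j| \\
&= |\tilde{a}_j - \overline{D}_j + 2(q_j(D) + (\hat{a}_j - q_j(D)))| \\
&\le |\tilde{a}_j - \overline{D}_j + 2q_j(D)| + 2|\hat{a}_j - q_j(D)| \\
&\le |\tilde{a}_j - \overline{D}_j + (\overline{D}_j - \tilde{a}_j)| + 2\alpha_{SV} \\
&= \alpha,
\end{aligned}$$

as required. So we need only show that $|\hat{a}_j - q_j(D)| \le \alpha_{SV}$ for all $j \in [d]$, which sparse vector
guarantees will happen with probability at least $1 - \beta_{SV}$ as long as

$$\left|\left\{j \in [d] : |q_j(D)| > \alpha_{SV}/2\right\}\right| \le c_{SV}. \qquad (9)$$

Now we verify that (9) holds with high probability.

By our setting of parameters, we have $q_j(D) = -z_j/2$. This means

$$\mathbb{P}\left[|q_j(D)| > \alpha_{SV}/2\right] = \mathbb{P}\left[|z_j| > \alpha/2\right] \le e^{-\alpha^2/8\sigma^2} = \frac{1}{\log^8 d}.$$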


Let $E_j \in \{0,1\}$ be the indicator of the event $|q_j(D)| > \alpha_{SV}/2$. Since the $z_j$s are independent, so are
the $E_j$s. Thus we can apply a Chernoff bound:

$$\mathbb{P}\left[\left|\left\{j \in [d] : |q_j(D)| > \alpha_{SV}/2\right\}\right| > c_{SV}\right] = \mathbb{P}\left[\sum_{j \in [d]} E_j > \frac{2d}{\log^8 d}\right] \le e^{-2d/\log^{16} d}. \qquad (10)$$

The failure probability of $M$ is bounded by the failure probability of SV plus (10), which is
dominated by $\beta_{SV} = \exp(-\log^4 d)$.

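Finally we consider the sample complexity. The accuracy is bounded by

$$\alpha \le \frac{40\sqrt{d \cdot \log(1/\delta) \cdot \log\log d}}{\varepsilon n},$$

which rearranges to

$$n \ge \frac{40\sqrt{d \cdot \log(1/\delta) \cdot \log\log d}}{\alpha\varepsilon}.$$

Theorem 4.2 requires

$$n \ge O\left(\frac{\sqrt{c_{SV}\log(1/\delta)}\,\log(d/\beta_{SV})}{\alpha\varepsilon}\right) = O\left(\frac{\sqrt{d \log(1/\delta)}}{\alpha\varepsilon}\right)$$

for sparse vector to work, which is also satisfied.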

We remark that we have not attempted to optimize the constant factors in this analysis.

□

A Alternative Lower Bound for Pure Differential Privacy

It is known [HT10] that any $\varepsilon$-differentially private mechanism that answers $d$ one-way marginals
requires $n \ge \Omega(d/\varepsilon)$ samples. Our techniques yield an alternative simple proof of this fact.

Theorem A.1. Let $M : \{\pm 1\}^{n \times d} \to [\pm 1]^d$ be an $\varepsilon$-differentially private mechanism. Suppose

$$\forall D \in \{\pm 1\}^{n \times d} \qquad \mathbb{E}_M\left[\|M(D) - \overline{D}\|_1\right] \le 0.9d.$$

Then $n \ge \Omega(d/\varepsilon)$.

The proof uses a special case of Hoeffding's Inequality:

Lemma A.2 (Hoeffding's Inequality). Let $X \in \{\pm 1\}^n$ be uniformly random and $a \in \mathbb{R}^n$ fixed. Then

$$\mathbb{P}_X\left[\langle a, X \rangle \ge \lambda\|a\|_2\right] \le e^{-\lambda^2/2}$$

for all $\lambda \ge 0$.
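
Lemma A.2 can be sanity-checked by simulation. The sketch below estimates the left-hand side by Monte Carlo for one fixed vector $a$ and compares it with $e^{-\lambda^2/2}$. The vector, the value of $\lambda$, the number of trials, and the function name are arbitrary illustrative choices.

```python
import math
import random
from typing import Sequence


def hoeffding_tail_estimate(a: Sequence[float], lam: float, trials: int,
                            rng: random.Random) -> float:
    """Monte Carlo estimate of P[<a, X> >= lam * ||a||_2] for uniform X in {-1, +1}^n."""
    norm = math.sqrt(sum(x * x for x in a))
    hits = 0
    for _ in range(trials):
        # Each coordinate of X is an independent uniform sign.
        inner = sum(x if rng.random() < 0.5 else -x for x in a)
        if inner >= lam * norm:
            hits += 1
    return hits / trials


rng = random.Random(0)
a = [1.0, 2.0, 0.5, 3.0, 1.5, 2.5, 1.0, 0.5]
lam = 1.5
estimate = hoeffding_tail_estimate(a, lam, trials=20000, rng=rng)
bound = math.exp(-lam ** 2 / 2)
```

The estimate should fall below `bound`, typically by a wide margin, since Hoeffding's inequality is not tight for a specific vector.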

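Proof of Theorem A.1. Let $x, x' \in \{\pm 1\}^d$ be independent and uniform. Let $D \in \{\pm 1\}^{n \times d}$ be $n$ copies
of $x$ and, likewise, let $D' \in \{\pm 1\}^{n \times d}$ be $n$ copies of $x'$. Let $Z = \langle M(D), x \rangle$ and $Z' = \langle M(D'), x \rangle$.

Now we give conflicting tail bounds for $Z$ and $Z'$, which we can relate by privacy.

By our hypothesis and Markov's inequality,

$$\begin{aligned}
\mathbb{P}[Z \le d/20] &= \mathbb{P}\left[\langle M(D), x \rangle \le 0.05d\right] \\
&= \mathbb{P}\left[\langle \overline{D}, x \rangle - \langle \overline{D} - M(D), x \rangle \le 0.05d\right] \\
&= \mathbb{P}\left[\langle \overline{D} - M(D), x \rangle \ge 0.95d\right] \\
&\le \mathbb{P}\left[\|\overline{D} - M(D)\|_1 \ge 0.95d\right] \\
&\le \frac{\mathbb{E}\left[\|\overline{D} - M(D)\|_1\right]}{0.95d} \\
&\le \frac{0.9}{0.95} \le 0.95.
\end{aligned}$$

Since $M(D')$ is independent from $x$, we have

$$\forall \lambda \ge 0 \qquad \mathbb{P}\left[Z' > \lambda\sqrt{d}\right] \le \mathbb{P}\left[\langle M(D'), x \rangle > \lambda\|M(D')\|_2\right] \le e^{-\lambda^2/2},$$

by Lemma A.2. In particular, setting $\lambda = \sqrt{d}/20$ gives $\mathbb{P}[Z' > d/20] \le e^{-d/800}$.
Now $D$ and $D'$ are databases that differ in $n$ rows, so privacy implies that

$$\mathbb{P}[M(D) \in S] \le e^{n\varepsilon}\,\mathbb{P}[M(D') \in S]$$

for all $S$. Thus

$$\frac{1}{20} \le \mathbb{P}\left[Z > \frac{d}{20}\right] = \mathbb{P}[M(D) \in S_x] \le e^{n\varepsilon}\,\mathbb{P}[M(D') \in S_x] = e^{n\varepsilon}\,\mathbb{P}\left[Z' > \frac{d}{20}\right] \le e^{n\varepsilon} e^{-d/800},$$

where

$$S_x = \left\{y \in [\pm 1]^d : \langle y, x \rangle > \frac{d}{20}\right\}.$$

Rearranging $1/20 \le e^{n\varepsilon} e^{-d/800}$ gives

$$n \ge \frac{d}{800\varepsilon} - \frac{\log(20)}{\varepsilon},$$

as required.

□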